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Abstract 

We  consider  the  problem  of  color  image  quantization,  or  clustering  of  the  color  space.  We  pro¬ 
pose  a  new  methodology  for  doing  this,  called  model-based  clustering  trees.  This  is  grounded 
in  model-based  clustering,  which  bases  inference  on  finite  mixture  models  estimated  by  max¬ 
imum  likelihood  using  the  EM  algorithm,  and  automatically  chooses  the  number  of  clusters 
by  Bayesian  model  selection,  approximated  using  BIC,  the  Bayesian  Information  Criterion. 
We  build  a  clustering  tree  by  first  clustering  the  first  color  band,  then  using  the  second  color 
band  to  cluster  each  of  the  clusters  found  at  the  first  stage,  and  the  resulting  clusters  are 
then  further  subdivided  in  the  same  way  using  the  third  color  band.  The  tree  is  pruned 
automatically  as  part  of  the  algorithm  by  using  Bayesian  model  selection  to  choose  the  num¬ 
ber  of  clusters  at  each  stage.  An  efficient  algorithm  for  implementing  the  methodology  is 
proposed.  The  method  is  applied  to  several  real  data  sets  and  compared,  with  good  results, 
to  an  alternative  method  that  clusters  simultaneously  on  all  bands. 

Keywords:  Bayesian  modeling;  Clustering;  Color-space  quantization. 
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1  Introduction 


Clustering  and  segmentation  in  an  image  analysis  context  have  a  long  history  [23].  Objectives 
include:  quantization  of  data  values  for  later  use  with  a  codebook  in  a  compression  context; 
targeting  delivery  to  display  devices  supporting  small,  bounded  pixel  data  value  depth;  as  a 
preliminary  to  object  and  feature  detection  and  analysis  in  images;  and  as  a  basis  for  other 
image  processing  operations  such  as  image  registration  and  archiving. 

Color  quantization  (see,  e.g.,  [33])  is  unsupervised  classification,  or  clustering,  of  the 
color  space.  Typically  a  potential  2563  (for  l-byte  pixel  values  in  each  color  band)  distinct 
colors  are  to  be  mapped  into  a  reduced  number  of  palette  colors,  e.g.  up  to  256  [27].  Color 
image  quantization  could  be  said  to  be  brittle  because  the  distribution  of  local  optima  in  the 
3-dimensional  color  space  is  expected  to  be  broad  [28]. 

In  this  work  we  will  be  concerned  with  3-band  color  data.  However  our  results  will  carry 
over  to  multiple  band  data  (e.g.  in  Earth  observation),  or  to  multi-  or  hyperspectral  data  in 
any  domain.  Our  work  may  similarly  be  relevant  for  aspects  of  image  sequence  and  video 
analysis. 

A  short  review  of  software  for  quantization  is  as  follows.  In  the  image  display  and  analysis 
program  xv,  Bradley  [3]  sequentially  picks  colors  which  are  as  unlike  as  possible  to  all  already 
picked  colors.  RGB  space  is  used.  In  ImageMagick,  Cristy  [8]  determines  a  color  description 
tree,  associates  the  frequency  of  occurrence  with  each  node  in  this  tree,  and  a  quantization 
error  defined  in  RGB  space.  ImageMagick  then  prunes  the  tree  based  on  these  measures.  An 
algorithm  developed  by  Wan  et  al.  [35]  also  uses  hyperboxes  with  hyperplane  splits,  based 
on  a  sum  of  squares  error  criterion. 

In  this  paper,  we  propose  a  new  method  for  color  image  quantization,  called  model-based 
clustering  trees.  This  combines  maximum  likelihood  estimation  of  finite  mixture  models  with 
Bayesian  model  selection.  It  operates  recursively  on  the  three  color  bands  of  the  image.  First 
it  clusters  the  pixels  on  the  basis  of  the  first  band.  Then,  using  the  second  color  band,  it 
clusters  each  of  the  clusters  found  in  the  first  stage.  The  third  stage  further  subdivides  the 
resulting  clusters  on  the  basis  of  the  third  band.  Bayesian  model  selection  is  used  at  each 
stage  to  determine  the  number  of  clusters,  so  that  the  data  are  used  to  decide  adaptively 
the  extent  to  which  the  tree  is  pruned. 

The  resulting  method  allows  the  number  of  color  quantization  levels  to  be  chosen  on  the 
basis  of  the  data,  if  desired.  If  the  number  of  quantization  levels  is  predetermined,  however 
[24]  the  method  can  easily  handle  this  as  a  special  case.  Also,  a  natural  order  often  exists  on 
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color  band  dimensions,  e.g.,  chromaticities  convey  far  less  perceptual  information  than  does 
the  luminosity  (see,  e.g.,  [36]).  Such  an  order  can  be  readily  accommodated  in  our  approach. 
Experimental  results  are  presented  below  for  the  cases  of  an  order  on  color  bands,  and  the 
case  of  no  such  order. 

We  can  readily  accommodate  noise  in  our  image  data.  This  is  implied  by  image  features 
taken  as  realizations  of  distributional  models.  Explicit  noise  components  are  readily  incor¬ 
porated  into  our  modeling,  as  discussed  in  earlier  work  of  ours  [4],  Our  MR  software  package 
[20]  provides  color  image  noise  filtering,  together  with  compression,  functionality. 

We  can  accommodate  a  very  small  number  of  classes  (clusters  or  segments)  for  the  pixels, 
or  a  large  number.  A  small  number  of  classes  may  be  needed  as  a  preliminary  to  a  data 
interpretation,  or  high-level  vision  stage  of  the  analysis,  or  to  cater  for  4-bit  displays,  in 
which  case  optimal  16-level  quantization  is  targeted.  A  large  number  of  classes  may  be 
needed  when  high  fidelity  to  the  original  image  is  required,  by  close  approximation  to  8-bit 
or  greater  precision  of  pixel  data  in  each  color  band. 

In  Section  2  we  describe  the  model-based  clustering  trees  methodology.  In  Section  3  we 
discuss  aspects  of  algorithm  design  and  properties.  In  Section  4  we  show  how  the  algorithm 
performs  on  several  real  datasets,  and  compare  it  to  an  alternative  algorithm  that  clusters 
separately  (rather  than  recursively)  on  the  color  bands. 

2  Model-Based  Clustering  Trees 

Our  basic  framework  is  that  of  model-based  clustering ,  as  described,  for  example,  by  Fraley 
and  Raftery  [14,  15].  In  this  methodology,  a  finite  mixture  of  normal  distributions  is  fit  to 
the  data  by  maximum  likelihood  estimation  using  the  EM  algorithm,  the  number  of  groups 
is  chosen  using  Bayesian  model  selection,  and  if  hard  clustering  is  desired,  each  datum 
is  assigned  to  its  most  likely  group  a  posteriori.  Model-based  clustering  trees  produces  a 
clustering  of  multivariate  data  by  clustering  on  each  band  or  dimension  recursively. 

We  now  briefly  outline  finite  mixture  modeling,  Bayesian  model  selection,  and  model- 
based  clustering  trees. 

2.1  Univariate  Normal  Finite  Mixture  Models 

In  the  univariate  normal  finite  mixture  model,  one-dimensional  observations  x*  are  assumed 
to  be  drawn  from  G  groups,  each  of  which  is  normally  distributed.  The  g-ih  group  has  mean 
pLg  and  variance  a2.  Given  observations  x  =  (aq, . . . ,  xn),  let  7  be  an  unobserved  nxG  cluster 
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assignment  matrix,  where  7j9  =  1  if  X{  comes  from  the  g-ih  group,  and  7j9  =  0  otherwise. 

Our  goals  are  to  determine  the  number  of  clusters  G,  to  determine  the  cluster  assignment  of 
each  datum,  and  to  estimate  the  parameters  gg  and  ag  of  each  cluster. 

The  probability  density  for  this  model  is 

f(Xi\9,  A)  =  Xgfg(xi\0g ),  (!) 

9=1 

where  6g  =  (ggia'g)T,  fg(- \9g)  is  a  normal  density  with  mean  fj,g  and  variance  a|,  9  = 

(0i,  ■  ■  --0g),  and  A  =  (Ai, . . . ,  Ag)  is  a  vector  of  mixture  probabilities  such  that  Xg  >  0  (g  = 

1, . . .  ,G)  and  J2g=i  A9  =  1. 

We  estimate  the  parameters  by  maximum  likelihood  using  the  EM  (expectation-maximization) 
algorithm  [10,  19],  one  of  the  most  successful  methods  in  modern  statistics.  For  its  applica¬ 
tion  to  model-based  clustering,  see  [18,  6,  9].  This  is  a  procedure  for  iteratively  maximizing 
likelihoods  in  situations  where  there  are  unobserved  quantities  and  estimation  would  be  sim¬ 
ple  if  these  were  known.  In  the  clustering  case,  the  unobserved  quantities  are  the  cluster 
assignments  given  by  the  matrix  7. 

The  EM  algorithm  iterates  between  the  E  step  and  the  M  step.  In  the  E  step,  the 
conditional  expectation,  7,  of  7  given  the  data  and  the  current  estimates  of  9  and  A  is 
computed,  so  that  7 ig  is  the  conditional  probability  that  x,t  belongs  to  the  g-th  group.  In 
the  M  step,  conditional  maximum  likelihood  estimators  of  9  and  A  given  the  current  7  are 
computed. 

The  point  is  that  the  E  step  and  the  M  step  are  both  simple,  so  that  the  EM  algorithm 
as  a  whole  is  also  simple.  By  contrast,  direct  maximization  of  the  likelihood  for  the  mixture 
model  is  complex  in  general.  Although  the  EM  algorithm  has  some  limitations  (e.g.  it  is 
not  guaranteed  to  converge  to  a  global  rather  than  a  local  maximum  of  the  likelihood),  it  is 
generally  efficient  and  effective  for  Gaussian  clustering  problems.  The  EM  algorithm  requires 
a  starting  point.  An  initial  classification  for  each  possible  number  of  clusters  may  be  found 
by  agglomerative  hierarchical  model-based  clustering  [22,  2], 

This  procedure  is  especially  efficient  for  clustering  image  pixels  using  single  color  bands 
or  greyscale  images.  In  general  the  EM  algorithm  requires  0(n )  time,  where  n  is  the  number 
of  pixels.  However,  typically  pixels  can  have  one  of  only  a  limited  number,  £,  of  intensities 
in  each  band,  such  as  256.  If  we  first  summarize  the  data  by  the  counts  of  the  numbers  of 
pixels  with  each  intensity  level,  the  EM  algorithm  becomes  an  0(1)  algorithm  rather  than 
an  0(n )  one.  Since  n  is  often  on  the  order  of  65,000  or  more,  and  i  is  typically  256,  this 
is  a  major  speed-up  and  provides  a  compelling  reason  for  clustering  one  band  at  a  time  if 
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this  can  be  done  without  degrading  performance  too  much.  This  is  the  motivation  for  the 
model-based  clustering  trees  methodology  described  in  Section  2.3. 


2.2  Choosing  the  Number  of  Clusters  via  Bayesian  Model  Selec¬ 
tion 


We  use  Bayesian  model  selection  to  choose  the  number  of  clusters.  For  review  of  Bayesian 
model  selection,  see  Kass  and  Raftery  [16]  and  Raftery  [25].  Pioneering  work  in  this  area 
was  due  to  H.  Jeffreys,  I.J.  Good  and  (according  to  the  latter)  A.  Turing. 

We  consider  a  range  of  candidate  numbers  of  clusters,  G  =  Gmin, . . . ,  Gmax.  Each  possible 
number  of  clusters,  G,  implies  a  different  statistical  model  for  the  data,  MG.  The  model  MG 
has  a  vector  of  unknown  parameters,  ipG,  consisting  of  the  G  means,  the  G  variances,  and  the 
(G  —  1)  independently  estimated  mixture  probabilities:  (3G  — 1)  parameters  in  all.  Our  prior 
model  probabilities  are  p(MG )  for  G  =  Gmjn, . . . ,  Gmax,  where  Y^G=GmiIiP(MG)  =  1-  Often 
each  number  of  clusters  considered  is  taken  to  be  equally  likely  a  priori ,  so  that  p(MG)  = 
l/(Gmax  —  Gmjn  +  1)  for  each  G.  The  model  parameters  ip G  also  have  prior  distributions 
p(ipG\MG),  which  are  typically  rather  diffuse  and  do  not  affect  the  final  conclusions  unduly. 
The  data  produce  posterior  model  probabilities,  p(MG\x ),  where  again  Z)G=Gmil  p(AfG|a;)  =  1. 

By  Bayes’  theorem, 


P{MG \x) 


p(x\MG)p(MG) 

T,GH=G^p{x\MH)p{MHy 


G  Gmin, . . . ,  GE 


(2) 


In  (2),  p(x\MG)  is  the  integrated  likelihood  of  model  MG,  which  requires  integration  over  the 
model’s  parameter  space,  as  follows: 


p(x\MG)  =  j p(x\ipG ,  MG)p(ipG\MG)dipG,  (3) 

by  the  law  of  total  probability. 

The  integral  (3)  is  intractable  analytically  and  is  not  easy  to  evaluate.  However,  twice 
the  logarithm  of  the  integrated  likelihood  can  be  approximated  by  the  Bayesian  Information 
Criterion,  or  BIC: 


2  \og p(x\MG)  ~  2  \ogp(x\ipG,  Mg)  —  (3G  —  1)  log  n 
=  BIC 


(4) 


(Schwarz  1978;  Kass  and  Wasserman  1995;  Raftery  1995).  In  (4), 

n  C 

p{x\ipG,MG )  =  I[T,Kfa(xi\^) 

i=  1  9=1 
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is  the  maximized  likelihood.  In  words,  BIC  =  2(log  maximized  likelihood)  +  (log  n)  (number 
of  parameters).  The  BIC  measures  the  balance  between  the  improvement  in  the  likelihood 
and  the  number  of  model  parameters  needed  to  achieve  that  likelihood.  While  the  absolute 
value  of  the  BIC  is  not  informative,  differences  between  the  BIC  values  for  two  competing 
models  provide  estimates  of  the  evidence  in  the  data  for  one  model  against  another.  The  use 
of  the  BIC  in  choosing  clusters  in  a  mixture  or  clustering  model  is  discussed  by  Roeder  and 
Wasserman  [26]  and  Dasgupta  and  Raftery  [9].  Applications  are  in  Campbell  et  al.  [4,  5], 
Mukherjee  et  al.  [21]  and  in  other  articles. 

2.3  Model-Based  Clustering  Trees  Algorithm 

For  color  image  segmentation,  we  cluster  the  first  band,  then  we  cluster  each  of  the  resulting- 
clusters  using  the  second  band,  and  finally  we  subdivide  the  resulting  clusters  yet  again  using 
the  third  band.  This  requires  choosing  the  3D  color  space  and  the  order  in  which  the  bands 
are  to  be  clustered;  these  issues  are  discussed  in  Section  3. 

The  algorithm  can  be  summarized  as  follows: 

1.  For  the  first  color  band,  use  BIC  to  choose  the  number  of  clusters  (here  we  use  Gmin  =  1, 
Gmax  =  9).  Use  the  EM  algorithm  to  estimate  the  parameters  of  the  mixture  model, 
and  assign  each  pixel  to  the  group  to  which  it  is  most  likely  to  belong  a  posteriori 

2.  For  each  cluster  identified  in  step  1,  carry  out  a  separate  model-based  cluster  analy¬ 
sis,  this  time  using  only  the  pixel  intensities  in  the  second  color  band.  Each  cluster 
identified  in  step  1  is  then  itself  subdivided  into  several  clusters. 

3.  For  each  (sub)cluster  identified  in  step  2,  subdivide  it  further  using  the  same  procedure 
as  in  step  2,  but  this  time  using  only  the  pixel  intensities  in  the  third  color  band. 

The  univariate  normal  mixture  model  fitting  was  carried  out  using  an  algorithm  initially 
developed  by  Stanford  [30].  In  implementing  the  algorithm,  numeric  sub-segment  mean 
values  are  down-scaled  by  a  factor  of  10  (i.e.,  cluster  labels  expressed  as  a  sequence  number 
1,  2,  ...  ,  and  subsequently  divided  by  10)  for  level  2  sub-segments,  and  by  a  factor  of  100 
(i.e.,  cluster  labels  expressed  as  a  sequence  number  1,  2,  ...  ,  and  subsequently  divided  by 
100)  for  level  3  sub-segments.  Given  the  linear  order  properties  explored  in  the  next  section, 
this  facilitates  addition  of  segment  and  sub-segment  means.  It  is  clear  that  this  algorithm 
provides  an  adaptive  stepwise  procedure  for  pruning  a  segment  tree  with  an  upper  limit  of 
93  =  729  segments. 
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Timing  tests  were  carried  out  in  order  to  look  at  scaling  with  respect  to  number  of 
clusters,  and  image  dimensions.  Scaling  with  respect  to  number  of  clusters  is  approximately 
linear.  For  3,  9,  15,  21,  27,  33  and  39  clusters  from  a  768  x512  array,  the  compute  times  were, 
respectively,  20,  40,  53,  69,  104,  108,  131  seconds.  Scaling  with  input  image  dimensionality 
was  as  follows,  where  floating  point  storage  was  used  for  input  images,  and  10  clusters  were 
found  in  each  case:  (300  x  300),  11  seconds;  (600  x  600),  35  seconds;  (900  x  900),  87 
seconds.  This  is  better  than  linear  in  this  dimensionality  region. 

3  Algorithm  Design 

3.1  Algorithm  Properties 

In  this  section,  we  review  results  for: 

1.  Band  segmentation,  and  spectral  clustering. 

2.  Label  monotonicity. 

3.  Partial  order  of  clusters,  and  total  order  of  cluster  labels. 

Contiguity  of  pixels  associated  with  a  cluster  implies  that  segmentation  rather  than 
clustering  holds.  In  the  spatial  or  image  dimension  space  domain,  our  algorithm  provides  a 
clustering  solution.  In  color  or  spectral  band  space,  our  algorithm  provides  a  segmentation 
solution.  It  is  clear  that  the  latter  allows  for  straightforward  inducing  of  a  total  order  relation 
on  the  image  pixels,  which  is  necessary  for  display  as  a  single  image  plane. 

In  the  following  we  will  discuss  further  properties  of  our  algorithm  related  to  segmentation 
in  the  spectral  domain. 

To  facilitate  image  display,  we  posit  the  following  label  monotonicity  principle. 

Label  monotonicity  principle: 

For  clusters  c*  of  means  m(c,),  and  labels  /(c,)  we  require:  /(q)  <  l{cj)  -x=r-  m(c*)  -<  m(cj ) 
for  all  i  and  j. 

The  relation  -<  is  a  total  order  on  the  multivariate  means. 

Such  a  monotonicity  principle  is  desirable  since  the  relation  -<  can  be  mapped  onto  a 
color  lookup  table,  allowing  the  labels  to  be  mapped  monotonically. 
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The  mixture  modeling  algorithm  is  a  segmentation  algorithm  since  it  is  one-dimensional 
and  we  have  the  following  exact  relationship  for  any  given  spatial  dimension  or  color  band: 
l(ci)  <  l(cj )  4=>-  m(cj)  <  m(cj). 

Next,  consider  clusters  defined  from  band  or  color  2,  c2,,  and  we  will  write  cu  for  a  cluster 
defined  from  band  1.  For  given  c^,  we  have  cu  =  Ujej{C2j  where  the  cardinality  of  band  2 
cluster  set,  Jj,  is  at  least  1.  When  |  Jj  |=  1  we  ignore  any  contribution  by  band  2,  i.e.,  no 
sub-clusters  have  been  found. 

A  fortiori  it  holds  that  l(c2i)  <  /(c2j)  <=>  ra(c2j )  <  m(c2j)  for  all  band  2  segments. 

Similar  properties  hold  for  band  3.  We  have  c u  =  U jeJic2j  for  a  set  of  subclusters,  Jj, 
and  c2j'  =  \Jjieji.C2j>  for  a  set  of  subclusters,  J[,  which  also  is  at  least  of  cardinality  1  and  at 
most  of  cardinality  |  c2 y  |.  Again  we  have  a  total  order  on  all  band  3  segments. 

Our  algorithm  produces  an  embedded  set  of  segments.  Is  there  a  total  order  on  segment 
means  resulting  from  this  embedded  set,  so  that  the  principle  of  label  monotonicity  is  re¬ 
spected?  To  show  that  this  is  indeed  the  case,  we  firstly  note  that  band  3  segments  replace 
a  parent  band  2  segment,  and  band  2  segments  replace  a  parent  band  1  segment. 

Secondly,  at  band  2,  given  cu  =  UjejtC2J:  we  can  ensure  that  max?e2j  <  Ci(j+i)  since  by 
the  total  order  on  band  2  and  on  band  1  segments,  we  have  a  well-defined  maximum  value 
of  C2j,  and  similarly  Ciq+q  >  Cij  is  next  in  band  1  sequence. 

These  two  properties  ensure  that  there  results  a  total  order  of  segment  means.  Hence  it 
follows  that  the  label  monotonicity  principle  is  respected. 

3.2  Unconstrained  Band  Ordering 

In  this  section,  we  follow  closely  Tate  [31]  who  considers  the  band  ordering  problem  for 
compression  of  multispectral  images. 

We  consider  the  problem  of  clustering  on  one  band  coordinate,  assuming  that  this  band 
presents  good  clustering  properties,  followed  by  clustering  on  a  second  band  based  on  the 
first  band  clustering,  and  so  on.  As  expressed  in  the  previous  section,  if  cu  is  the  ith  cluster 
from  the  first  band,  then  we  seek  clusters  c2j  such  that  cu  =  Ujejic2j1  i.e.,  Jj  is  a  partition 
of  cluster  cij.  Similarly  we  proceed  to  a  third  band,  based  on  available  results  for  bands  1 
and  2. 

The  order  in  which  we  consider  the  bands  is  evidently  important.  Let  us  define  goodness 
of  clustering  as  the  tightness  of  the  clusters,  i.e.  the  minimum  sum  of  variances  of  clusters 
for  all  bands. 

Clearly  a  result  of  this  definition  is  that  when  we  find  no  subclustering  at  bands  2  and 
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3,  the  corresponding  subcluster  variances  are  equal  to  0,  and  hence  the  contribution  to  the 
overall  sum  of  variances  is  thereby  minimized. 

Finding  the  optimal  band  ordering  for  clustering  is  not  difficult  in  the  color  space  case, 
since  it  involves  only  6  possible  alternatives.  In  the  case  of  hyperspectral  data,  with  many 
dozens  or  hundreds  of  bands,  the  situation  is  more  computationally  demanding. 

Define  graph  G  =  (V,  E )  such  that  an  edge  E,tJ  has  weight  Wij  representing  the  added 
clustering  quality  attainable  by  clustering  band  i  before  band  j.  By  design,  band  i  is  parti¬ 
tioned,  and  the  clusters  of  the  i-partition  are  each  partitioned  based  on  band  j  information. 
Define  as  the  improvement  in  the  overall  sum  of  posteriors  (or  sum  of  intra-cluster  vari¬ 
ances:  the  criterion  used  to  quantify  clustering  quality  is  not  important  here)  by  taking  band 
i  before  band  j. 

The  problem  of  finding  an  optimal  band  order  in  the  general  case  of  many  bands  is 
equivalent  to  finding  an  optimal  traveling  salesman  path  in  the  graph,  G.  This  Hamiltonian 
path  problem  in  NP-hard  and  the  corresponding  decision  problem  of  knowing  whether  we 
have  or  do  not  have  a  Hamiltonian  path  in  G  is  NP-complete. 

3.3  Transformations 

In  this  section,  we  investigate  image  transformation  band  ordering 

RGB  coordinates  are  converted  to  luminance-hue-saturation  YIQ  using  the  following 
transformation. 


(Y\ 

(  0.30 

0.59 

0.11  \ 

I 

— 

0.60 

-0.28 

-0.32 

\QJ 

\  0.21 

-0.52 

0.31  ) 

(5) 


Cheng  et  al.  [7]  present  a  useful  overview  of  color  space  transformations  (unfortunately 
marred  by  many  typographical  errors  including  in  the  definition  of  the  above  transformation.) 

Luminance  alone  provides  a  best  monochrome  approximation.  The  I  and  Q  chrominance 
components  can  be  significantly  limited  (e.g.,  using  decimation)  without  noticeable  image 
degradation.  This  transformation  therefore  provides  for  an  order  on  the  image  bands.  So 
also  does  an  eigen- analysis  of  the  multiband  image.  Finally,  we  investigate  also  a  color 
enhancement  procedure  proposed  in  Toet  [32]:  perform  sigma-clipping  on  each  color  band, 
followed  by  color  saturation: 
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Color  space,  order 

Number  of  quantization  levels 

RGB 

9,  73,  125 

GBR 

8,  69,  240 

RBG 

9,  83,  214 

YIQ 

9,  77,  218 

PCI,  PC2,  PC3 
of  RGB 

9,  78,  180 

RGB  clipped, 
saturated 

9,  73,  208 

Table  1:  Cardinalities  of  embedded  partitions  for  successive  bands,  1,  2  and  3.  Sail  data. 


where  Ci 


(  R1 
G' 
\B' 

max{  R,G,B  },  and  C2 


Cx 

C1-c2 


min{  R,G,B  }. 


(6) 


4  Examples 

Test  data  sets  from  [17]  were  used.  These  included  the  sail  of  dimensions  3,  768,  512; 
tulips  of  dimensions  3,  767,  512;  and  f  rymire  images. 

4.1  Fitting  Model-Based  Clustering  Trees 

We  begin  by  partitioning  a  first  band,  and  then  continuing  with  the  partitioning  of  previous 
band  clusters.  By  constraining  the  mixture  model  to  9  image  marginal  Gaussian  components, 
this  yields  a  tree  of  Gaussian  components  of,  at  most,  93  =  729  nodes.  The  BIC  model 
assessment  criterion  will  tell  us  if  we  can  justifiably  retain  a  pruned  version  of  this  tree  of 
Gaussians. 

Using  the  sails  data,  the  numbers  of  quantization  levels  found  for  bands  1,  2  and  3 
are  shown  in  Table  1.  Of  note  is  that  there  is  no  experimental  advantage  in  the  use  of 
band  ordering  such  as  through  use  of  YIQ  color  space,  or  eigen-images  (principal  component 
images,  PCI,  PC2,  PC3).  In  all  cases,  increased  numbers  of  clusters  entails  corresponding 
improvements  in  visual  quality,  and  generally  in  global  measures  such  as  mean  square  error 
(MSE)  relative  to  input  data.  MSE  values  are  given  in  the  next  subsection. 
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4.2  Comparison  with  ImageMagick:  Separable  Band  Quantiza¬ 
tion 

We  now  turn  to  look,  for  comparison,  at  an  alternative  method  based  on  quantization  carried 
out  independently  on  the  color  bands,  leading  to  a  small  number  of  quantization  levels 
on  each.  The  model-based  clustering  trees  algorithm  takes  a  color  band,  and  determines 
the  best  quantization.  Then  in  a  stepwise  procedure,  further  bands  are  taken  and  already 
existing  quantized  bands  are  further  quantized.  Our  algorithm  is  therefore  a  progressive 
quantization  one.  In  ImageMagick,  by  contrast,  the  quantization  algorithm  of  Cristy  [8] 
carries  out  quantization  directly  in  the  color  space.  Such  full  multivariate  quatization  ought 
to  be  superior  but,  as  we  will  see,  gives  results  which  are  less  acceptable. 

We  consider  the  RGB  sail  example.  Our  algorithm  produced  9  quantized  levels  on 
the  first  band,  73  on  the  second,  and  125  on  the  third  (cf.  Table  1).  This  result  is  a 
progressive  one,  with  an  approximation  to  the  original  image  based  on  9  quantized  levels, 
a  better  approximation  to  the  original  data  based  on  73  quantized  levels,  and  finally  an 
approximation  to  the  original  data  based  on  125  quantized  levels. 

ImageMagick’s  quantize  procedure  was  given  125  quantization  levels  as  the  target,  and 
produced  91,  89  and  94  quantization  levels  on  bands  1,  2  and  3,  respectively  (i.e.,  R,  G,  and 
B  bands). 

Mean  square  error  (MSE)  discrepancy  between  the  original  data  and  the  quantized  data 
closely  reflects  the  number  of  quantization  levels  used  in  the  approximation.  For  band  1, 
MSEs  for  our  algorithm  (9  quantization  levels)  and  Cristy’s  (91  quantization  levels)  were 
124.95  and  99.54.  For  band  2,  MSEs  for  our  algorithm  (73  quantization  levels)  and  Cristy’s 
(89  quantization  levels)  were  127.66  and  127.54.  For  band  3,  MSEs  for  our  algorithm  (125 
quantization  levels)  and  Cristy’s  (94  quantization  levels)  were  126.28  and  135.29.  Histogram 
equalization  was  used  prior  to  these  calculations,  which  was  necessary  for  rescaling  given 
that  our  coding  involved  0.1  and  0.01  color  step-size  increments  for  bands  2  and  3. 

Considering  the  RGB  tulips  example,  we  find  for  our  method  9,  72,  and  237  progressively 
quantized  bands.  For  Cristy  [8],  we  find  113,  133,  and  133  quantized  levels  for  the  three 
bands,  respectively.  The  user  parameter  yielding  this  result  was  237  quantization  levels. 

Qualitatively,  the  results  are  very  different.  Figures  1,  2  and  3  show  the  tulips  image. 

Figure  1  is  a  greyscale  version  of  the  768  x  512  image. 

Figure  2  results  from  applying  our  algorithm  to  the  RGB  data  volume.  For  comparability 
with  the  ImageMagick  Cristy  [8]  result,  we  took  a  9  quantization  level  result  for  the  R  band, 
a  9  quantization  level  result  for  the  G  band,  and  an  8  quantization  level  result  for  the  B 
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band.  (The  R  band  quantization  levels  were  tested  using  BIC  against  all  smaller  numbers 
of  levels,  and  the  G  and  B  bands  were  tested  for  8  and  9  levels.) 

Figure  3  results  from  the  Cristy  ImageMagick  quantization  procedure.  In  the  latter,  9 
quantization  levels  were  requested,  yielding  an  image  with  8,  9,  and  8  quantization  levels 
for  the  three  color  bands.  Taking  the  R  band  alone  of  the  ImageMagick  result,  the  good  fit 
to  the  input  data  (MSE  90.13  for  Cristy  as  opposed  to  128.54  for  our  algorithm,  R  band 
in  both  cases)  was  explained  by  a  somewhat  crude  quantization  rendition  of  the  image.  A 
better  rendition  of  the  image  is  obtained  by  straightforwardly  converting  the  quantized  3D 
data  volume,  with  8,  9,  and  8  quantization  levels,  to  greyscale.  This  is  shown  in  Figure  3. 

Even  allowing  for  the  degradation  inherent  in  down-sizing  and  converting  to  greyscale 
Postscript,  we  find,  with  reference  to  the  original  data  in  Figure  1,  that  the  output  of  our 
algorithm  in  Figure  2  is  superior  to  the  ImageMagick  result  in  Figure  3.  The  ImageMagick 
result  is  however  faster  than  our  current  implementation.  On  a  Sun  Sparc  10  workstation, 
it  takes  approximately  10  seconds  for  the  tulips  data,  compared  to  40  seconds  for  each  of 
the  three  color  bands  in  the  case  of  our  algorithm. 

Figures  4,  5  and  6  show  the  sails  data.  Considerable  visual  detail  retained  by  our 
algorithm  (Figure  5)  is  lost  in  the  ImageMagick  quantization  (Figure  6). 

5  Discussion 

We  have  shown  how  a  Bayesian  modeling  approach,  model-based  clustering  trees,  can  lead 
to  good  results  in  the  area  of  color  image  clustering.  Formal  underpinnings  for  such  an 
algorithm  facilitate  choice  of  system  parameters  (e.g.  number  of  clusters)  which  in  a  general 
setting  would  be  set  arbitrarily. 

The  fitting  of  a  tree  of  Gaussian  components  may  be  of  benefit  in  the  exploration  of 
general  parameter  spaces,  since  we  are  not  overly  dependent  on  an  analysis  function  or 
kernel  (here:  Gaussian)  of  given  morphology.  The  partial  order  resulting  from  the  tree  of 
Gaussian  components  can  (at  least  in  principle)  accommodate  arbitrary  alignments  or  curves 
in  multidimensional  clusters.  Djorgovski  et  al.  [11]  describe  a  burgeoning  need  for  solving- 
such  problems. 

The  two  algorithms  studied  in  this  work  -  based  on  fitting  of  a  tree  of  Gaussians,  and 
separable  clustering  -  are  of  potential  interest  also  for  other  multiple  band,  and  indeed 
general  multidimensional,  data  sets.  Future  work  will  be  on  these  themes. 

One  possible  improvement  on  the  work  would  be  to  prune  the  tree  further  using  hierar- 
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Figure  1:  Tulips  test  image,  original  data.  The  original  is  in  color,  and  this  is  shown  here 
in  greyscale. 

chical  model-based  clustering  [2],  Here  one  would  start  with  the  partition  resulting  from  our 
method,  treat  that  as  an  initial  partition,  and  perform  agglomerative  hierarchical  clustering. 
One  would  use  the  loglikelihood  as  the  clustering  criterion  for  choosing  the  groups  to  be 
merged  at  each  stage,  and  BIC  as  the  criterion  for  deciding  when  to  stop  merging.  This 
could  yield  a  result  with  fewer  quantization  levels,  but  without  degrading  the  quality  of  the 
quantization. 

An  alternative  approach  is  to  carry  out  model-based  clustering  using  mixtures  of  mul¬ 
tivariate  normal  distributions  directly  on  the  three-dimensional  pixel  intensities.  This  was 
done  for  MRI  images  [12],  using  images  with  more  than  three  bands.  It  is  highly  time- 
consuming  however,  and  in  the  MRI  work  it  was  feasible  only  because  many  pixels  could  be 
excluded  a  priori.  Our  approach  is  much  more  efficient  computationally  and,  from  the  ex¬ 
amples  described,  captures  most  forms  of  multivariate  clustering  in  color  space  very  well.  It 
is  possible  to  construct  situations  where  our  method  would  perform  much  less  well  than  full 
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Figure  2:  Quantization  using  our  algorithm.  Number  of  quantization  levels  in  R,  G  and  B: 
9,  9,  8.  The  original  is  in  color,  and  this  is  shown  here  in  greyscale. 

multivariate  model-based  clustering,  but  we  have  not  encountered  such  situations  in  prac¬ 
tice,  and  they  would  seem  rather  contrived.  Nevertheless,  if  it  were  possible  to  implement 
full  multivariate  model-based  clustering  on  large  numbers  of  pixels  in  comparable  compute 
time  to  our  method,  this  would  seem  worth  doing,  but  this  has  not  yet  been  accomplished. 
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